{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4e2a6d7c",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "We will use the Transformers library from HuggingFace which is pip-installable:\n",
    "\n",
    "pip install transformers\n",
    "\n",
    "You'll also probably want to use PyTorch"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a370a66a",
   "metadata": {},
   "source": [
    "## Exercise 1: Prompt Engineering 101\n",
    "\n",
    "The aim of this exercise is to understand how prompt structure affects LLM outputs.\n",
    "\n",
    "1. Use transformers.pipeline to interact with gpt2 or mistralai/Mistral-7B-Instruct-v0.1\n",
    "\n",
    "2. Try the three different format prompts:\n",
    "   * Instructional: \"Write a poem about data science in astronomy\"\n",
    "   * Conversational: \"What can you tell me about data science in astronomy\"\n",
    "   * Completion-style: \"Data science in astronomy is the field of...\"\n",
    "   \n",
    "3. Vary the temperature of the model and the top_k/top_p settings and see what effects this has on the outputs\n"
   ]
  },
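  {
   "cell_type": "markdown",
   "id": "c41f0a9e",
   "metadata": {},
   "source": [
    "As a rough illustration of what step 3 changes, the sketch below applies temperature scaling, top-k filtering and top-p filtering to a made-up set of logits in plain Python. The function names (`softmax`, `top_k_filter`, `top_p_filter`) and the toy scores are assumptions for illustration only; this is not how Transformers implements sampling internally.\n",
    "\n",
    "Lower temperatures concentrate probability on the most likely token, while higher temperatures flatten the distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8b3d27a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "\n",
    "def softmax(logits, temperature=1.0):\n",
    "    # Divide the logits by the temperature, then normalise to probabilities\n",
    "    scaled = [x / temperature for x in logits]\n",
    "    peak = max(scaled)  # subtract the maximum for numerical stability\n",
    "    exps = [math.exp(x - peak) for x in scaled]\n",
    "    total = sum(exps)\n",
    "    return [e / total for e in exps]\n",
    "\n",
    "\n",
    "def top_k_filter(probs, k):\n",
    "    # Keep the k most probable tokens, zero out the rest, renormalise\n",
    "    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)\n",
    "    keep = set(order[:k])\n",
    "    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]\n",
    "    total = sum(kept)\n",
    "    return [q / total for q in kept]\n",
    "\n",
    "\n",
    "def top_p_filter(probs, p):\n",
    "    # Keep the smallest set of top tokens whose cumulative probability reaches p\n",
    "    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)\n",
    "    keep, mass = set(), 0.0\n",
    "    for i in order:\n",
    "        keep.add(i)\n",
    "        mass += probs[i]\n",
    "        if mass >= p:\n",
    "            break\n",
    "    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]\n",
    "    total = sum(kept)\n",
    "    return [q / total for q in kept]\n",
    "\n",
    "\n",
    "# Made-up scores for four candidate tokens (an assumption, not model output)\n",
    "toy_logits = [2.0, 1.0, 0.5, -1.0]\n",
    "\n",
    "for t in (0.5, 1.0, 2.0):\n",
    "    print('temperature', t, [round(q, 3) for q in softmax(toy_logits, t)])\n",
    "\n",
    "base = softmax(toy_logits)\n",
    "print('top_k=2  ', [round(q, 3) for q in top_k_filter(base, 2)])\n",
    "print('top_p=0.8', [round(q, 3) for q in top_p_filter(base, 0.8)])"
   ]
  },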
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7de00d0b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "a2c455d9",
   "metadata": {},
   "source": [
    "## Exercise 2: Text Classification with AstroBERT\n",
    "\n",
    "The aim of this exercise to use a pre-trained LLM to classify astronomical texts.\n",
    "\n",
    "1. Create a data set of 20 random sentences from abstracts on arXiv (https://arxiv.org/list/astro-ph/new)\n",
    "\n",
    "2. Using the AstroBERT model (\"EleutherAI/astroBERT\") classify these sentences"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6131a76a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "1a1bcd38",
   "metadata": {},
   "source": [
    "## Exercise 3: LLMs as Zero-shot Annotators\n",
    "\n",
    "The aim of this exercise is to repurpose LLMs to do zero-shot labeling.\n",
    "\n",
    "1. Using the data set of sentences you created in Ex. 2, use a zero-shot classification pipeline to create label topics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95200854",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "0340c335",
   "metadata": {},
   "source": [
    "## Exercise 4: Text Summarization and Evaluation\n",
    "\n",
    "The aim of this exercise is to assess how well LLMs summarize text.\n",
    "\n",
    "1. Take a few abstracts from arXiv and run them through a summarization pipeline using AstroBERT as the model.\n",
    "\n",
    "2. Compare the output with using model=\"sshleifer/distilbart-cnn-12-6\"."
   ]
  },
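  {
   "cell_type": "markdown",
   "id": "f52a7c31",
   "metadata": {},
   "source": [
    "One simple way to put a number on summary quality is word overlap with a reference text. The sketch below computes a ROUGE-1-style unigram precision, recall and F1 in plain Python. It is a simplified stand-in rather than the official ROUGE implementation, and the function name `unigram_overlap` and the example sentences are made up for illustration.\n",
    "\n",
    "In practice you would pass in a model-generated summary as `candidate` and a reference text of your choice, for example a human-written summary, as `reference`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d04b6e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def unigram_overlap(reference, candidate):\n",
    "    # Lower-case both texts and count alphanumeric word tokens\n",
    "    ref = Counter(re.findall(r'[a-z0-9]+', reference.lower()))\n",
    "    cand = Counter(re.findall(r'[a-z0-9]+', candidate.lower()))\n",
    "    # Counter intersection keeps the minimum count of each shared word\n",
    "    overlap = sum((ref & cand).values())\n",
    "    precision = overlap / max(sum(cand.values()), 1)\n",
    "    recall = overlap / max(sum(ref.values()), 1)\n",
    "    if overlap == 0:\n",
    "        return precision, recall, 0.0\n",
    "    f1 = 2 * precision * recall / (precision + recall)\n",
    "    return precision, recall, f1\n",
    "\n",
    "\n",
    "# Placeholder texts (assumptions, not real abstracts or model outputs)\n",
    "reference = 'We measure the masses of galaxy clusters using weak lensing'\n",
    "candidate = 'Galaxy cluster masses are measured with weak lensing'\n",
    "\n",
    "precision, recall, f1 = unigram_overlap(reference, candidate)\n",
    "print('precision', round(precision, 3))\n",
    "print('recall   ', round(recall, 3))\n",
    "print('f1       ', round(f1, 3))"
   ]
  },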
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d486fc7",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}